Modern bibliographic databases provide the basis for scientific research and its evaluation. While
their content and structure differ substantially, there exist only informal notions of their reliability.
Here we compare the topological consistency of citation networks extracted from six popular
bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed
through a rich set of local and global graph statistics. We first reveal statistically significant
inconsistencies between some of the databases with respect to individual statistics. For example, the
introduced field bow-tie decomposition of DBLP Computer Science Bibliography substantially differs
from the rest due to the coverage of the database, while the citation information within arXiv.org is
the most exhaustive. Finally, we compare the databases over multiple graph statistics using the
critical difference diagram. The citation topology of DBLP Computer Science Bibliography is the least
consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from
the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics
and scientometrics or as a scientific evaluation guideline for governments and research agencies.


Bibliographic databases range from expensive hand-curated professional solutions like Web of
Science and Scopus to preprint repositories, public servers and automated services that
collect freely accessible manuscripts from the Web. These provide the basis for scientific
research, where new knowledge is derived from the existing, and are also the main source of
its evaluation. Undoubtedly, the number of citations a paper receives is still considered to be the
main indicator of its importance or relevance. However, the probability distribution of scientific
citations has been shown to follow a wide range of different forms including power-law, shifted
power-law, stretched exponential, log-normal, Tsallis, and modified Bessel, to name just a
few. Although some methods used in these studies might be questionable, more importantly, they
are based on different bibliographic data. In fact, the content and structure of modern bibliographic
databases differ substantially, while there exist only informal notions of their reliability.

One way to assess the databases is simply by the amount of literature they cover. Web of Science
spans over 100 years and includes several dozens of millions of publication records, an extent
similar to that of Scopus, which, however, came into existence only some ten years ago. On the
other hand, the preprint repository arXiv.org and the digital library DBLP Computer Science
Bibliography both date back to the 1990s and include only millions of publications or publication records.
The coverage of different bibliographic databases has also been investigated by various scholars,
while others have also analyzed their temporal evolution, available features, data acquisition
and maintenance methodology, and their use within a typical scientific workflow.

Yet, despite some notable differences, the reliability of bibliographic databases is primarily seen
as the accuracy of their citation information. While citations are input by hand in the case of
professional databases, services like CiteSeer and Google Scholar use information retrieval and machine
learning techniques to automatically parse citations from publication manuscripts. Expectedly,
this greatly impacts bibliometric analyses and standard metrics of scientific evaluation like citation
counts and h-index. Although networks of citations between scientific papers have been studied
since the 1950s, and are also commonly used in the modern network analysis literature, there
exists no statistical comparison of citation topology of different bibliographic databases.

In this study, we compare the topological consistency of citation networks extracted from six
popular bibliographic databases (see Methods). The networks are assessed through local and global
graph statistics by a methodology borrowed from the machine learning literature. We first
reveal statistically significant inconsistencies between some of the databases with respect to
individual graph statistics. For example, the introduced field bow-tie decomposition of DBLP Computer
Science Bibliography substantially differs from the rest due to the coverage of the database or the
sampling procedure, while the citation information within arXiv.org is shown to be the most
exhaustive. Finally, we compare the consistency of databases over multiple graph statistics. The citation
topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not
surprisingly, Web of Science is significantly more reliable from this perspective. Note that reliability
is here assessed through deviation from the majority (see Discussion). Differences between other databases
are not statistically significant. This work can serve either as a reference for scholars in bibliometrics
and scientometrics or as a scientific evaluation guideline for governments and research agencies.


Results 

Citation networks representing bibliographic databases are compared through 21 graph statistics
described in Methods. In the following, we discuss the values of statistics in the context of complex
network theory. Next, we reveal some statistically significant differences in individual statistics
using Student t-test. We then select ten statistics whose independence is confirmed by Fisher z-test
and show that the databases display significant inconsistencies in the selected statistics using
Friedman rank test. Last, the databases with no significant inconsistencies are revealed by Nemenyi
post-hoc test and the critical difference diagram. Finally, we also compare the bibliographic
databases with the selected online databases to verify the predictive power of the employed
statistical methodology. See Methods for further details on statistical comparison.

Graph statistics of citation networks. Table 1 shows descriptive statistics of citation networks.
The networks range from thousands of nodes to millions of links, while the largest weakly connected
components contain almost all the nodes. This is consistent with the occurrence of a giant connected
component in random graphs. Directed networks are often assessed also according to their bow-tie
structure. However, due to the acyclic nature of citation networks where papers can only cite papers
from the past, the decomposition proves meaningless. We introduce the field bow-tie decomposition
into the in-field component, which consists of papers citing no other paper, the out-field component,
which consists of papers not cited by any other paper, and the field core, which contains the
remaining papers. The out-field component thus includes the research front, and the in-field and core
components include the knowledge or intellectual base. Table 1 shows the percentage of nodes in each
of the field components, while a visual representation is given in Fig. 1. Notice that, in most cases,
the majority of papers is included in the core and out-field components of the citation networks.
Nevertheless, the main mass of the papers shifts towards the in-field component in HistCite and DBLP
databases (Figure 1, panels D and E). Since the latter consists of papers from merely major journals
and conferences, and the former is based on the bibliography of a single author, many of the papers
in the databases cite no other.
Hence, reducing a bibliographic database to only a subset of publications or authors gives notably
different citation structure and also influences many common graph statistics.
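
The field bow-tie decomposition described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: it assumes that citations are given as (citing, cited) pairs and that isolated papers and self-citations have already been removed. The function name `field_bow_tie` and the toy network are hypothetical.

```python
from collections import defaultdict


def field_bow_tie(citations):
    """Split papers into the in-field, out-field and core components.

    `citations` is an iterable of (citing, cited) pairs. Isolated papers and
    self-citations are assumed to have been removed beforehand.
    """
    out_degree = defaultdict(int)  # number of papers a paper cites
    in_degree = defaultdict(int)   # number of papers citing a paper
    papers = set()
    for citing, cited in citations:
        out_degree[citing] += 1
        in_degree[cited] += 1
        papers.update((citing, cited))

    in_field = {p for p in papers if out_degree[p] == 0}   # cite no other paper
    out_field = {p for p in papers if in_degree[p] == 0}   # cited by no other paper
    core = papers - in_field - out_field                   # all remaining papers
    return in_field, out_field, core


# Hypothetical toy network: e cites c and d, c and d cite b, b cites a.
toy = [("e", "c"), ("e", "d"), ("c", "b"), ("d", "b"), ("b", "a")]
in_field, out_field, core = field_bow_tie(toy)
shares = {name: 100.0 * len(part) / 5
          for name, part in [("in-field", in_field), ("out-field", out_field), ("core", core)]}
```

In the toy network, paper a forms the in-field component, paper e the out-field component, and the remaining three papers the core, so `shares` holds the kind of percentages the text reports for each database.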

Table 2 shows degree statistics of citation networks. Observe that the mean degree $\langle k \rangle$ is around
8.8 in all cases except arXiv database, which, somewhat surprisingly, coincides with the common
density of real-world networks. Note, however, that since $\langle k \rangle / 2 = \langle k_{in} \rangle = \langle k_{out} \rangle$ for any
network, the papers cite and are cited by only four other papers on average. This number becomes
meaningful when one considers that far more citations come from outside the field, whereas
all databases are subsets of their respective fields in some sense. The considerably higher $\langle k \rangle$ in arXiv
database is most likely due to several reasons. In contrast to other databases, arXiv.org stores journal
and conference papers, technical reports, draft manuscripts that never came to print etc. Next, the
citation network studied has been released within the KDD Cup 2003 (http://www.cs.cornell.edu/projects/kddcup)
and has thus presumably been cleansed appropriately. Also, the subset of arXiv.org considered
consists of physics publications, while other databases consist of computer science publications.
Regardless of the true reason, the citation information within arXiv database is notably more exhaustive,
which clearly reflects in its graph structure (see field bow-tie in Fig. 1, panel F).
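
The identity between the mean degree and the mean in-degree and out-degree follows from the fact that every citation contributes one out-link and one in-link. The sketch below checks this on a hypothetical toy network; the function name `mean_degrees` is an illustrative choice, not part of the study.

```python
from collections import Counter


def mean_degrees(citations):
    """Return (mean degree, mean in-degree, mean out-degree) of a citation network."""
    citations = list(citations)
    k_out = Counter(citing for citing, _ in citations)
    k_in = Counter(cited for _, cited in citations)
    papers = set(k_out) | set(k_in)
    n, m = len(papers), len(citations)
    mean_in = sum(k_in.values()) / n    # every citation adds one in-link, so m / n
    mean_out = sum(k_out.values()) / n  # every citation adds one out-link, so m / n
    mean_k = 2 * m / n                  # every citation touches two papers
    return mean_k, mean_in, mean_out


# Hypothetical toy network with five papers and five citations.
toy = [("e", "c"), ("e", "d"), ("c", "b"), ("d", "b"), ("b", "a")]
mean_k, mean_in, mean_out = mean_degrees(toy)
```

With a mean degree of about 8.8, the same identity gives roughly 4.4 cited and 4.4 citing papers per publication, which is the "four other papers" mentioned in the text.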

Figure 1 plots degree distributions of citation networks, while the corresponding scale-free
exponents $\gamma$, $\gamma_{in}$ and $\gamma_{out}$ are given in Table 2. We stress that not all distributions, especially
out-degree distributions, are a valid fit to a power-law form. Nevertheless, the degree distributions
further confirm the inconsistencies observed above. A larger number of non-citing papers results in
a less steep out-degree distribution, with $\gamma_{out} \approx 2.6$ for HistCite and DBLP databases, while
$\gamma_{out} \approx 3.8$ otherwise. On the contrary, the in-degree distribution of HistCite database is much
steeper with $\gamma_{in} = 3.5$, while $\gamma_{in} \approx 2.5$ for the rest. In fact, $\gamma_{in} > \gamma_{out}$ for HistCite database,
whereas $\gamma_{in} < \gamma_{out}$ for all others. Finally, the lack of low-citing papers in arXiv database prolongs
the degree distributions towards the right-hand side of the scale (see Fig. 1, panel F).
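
For readers who wish to estimate a scale-free exponent themselves, the sketch below uses a common approximate maximum-likelihood estimator for discrete power laws, $\gamma \approx 1 + n / \sum_i \ln\left(k_i / (k_{min} - 1/2)\right)$. This is an assumption made for illustration: the fitting procedure actually used in the study is described in its Methods and may differ. The cutoff `k_min` and the example degrees are hypothetical.

```python
import math


def power_law_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of a discrete power-law exponent.

    Only degrees k >= k_min enter the fit. The approximation is known to be
    crude for very small k_min, so the result should be read as indicative.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical in-degree sequence of a small citation network.
in_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13]
gamma_in = power_law_exponent(in_degrees, k_min=1)
```

An estimate alone does not establish that a power law is a valid fit, which is the caveat raised in the text for the out-degree distributions.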

Degree mixing in Table 2 reveals no particularly strong correlations. Still, the in-degree and
out-degree mixing coefficients $r_{(in,in)}$ and $r_{(out,out)}$ show positive correlation, while the undirected
degree mixing $r$ is negative. For comparison, $r \gg 0$ in social networks, and $r \ll 0$ for Internet
and the Web. Again, HistCite and DBLP databases deviate from common behaviour due to the
reasons given above. For example, the directed degree mixing coefficient $r_{(out,in)}$ is substantially
lower for HistCite database, while all directed coefficients are relatively low for DBLP database.
Figure 1 also plots neighbour connectivity profiles of citation networks. Notice dichotomous degree
mixing that is positive for smaller out-degrees and negative for larger in-degrees, represented by
increasing or decreasing trend, respectively (see, e.g., Fig. 1, panels A and B). Similar observations
were recently made also in software and undirected biological networks. Consistent with the
above, these trends are not present in HistCite and DBLP databases (see Fig. 1, panels D and E).
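
A directed degree mixing coefficient can be illustrated as a Pearson correlation computed over the links of the network. The sketch below assumes the convention that $r_{(\alpha,\beta)}$ correlates the $\alpha$-degree of the citing paper with the $\beta$-degree of the cited paper; the exact definition adopted in the study is given in its Methods. The function names and the toy network are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side has constant degree
    return cov / math.sqrt(var_x * var_y)


def degree_mixing(citations, source="out", target="in"):
    """Correlate the `source`-degree of citing papers with the `target`-degree
    of cited papers over all citations (assumed convention, see lead-in)."""
    citations = list(citations)
    degree = {
        "out": Counter(citing for citing, _ in citations),
        "in": Counter(cited for _, cited in citations),
    }
    xs = [degree[source][citing] for citing, _ in citations]
    ys = [degree[target][cited] for _, cited in citations]
    return pearson(xs, ys)


# Hypothetical toy network of four citations.
toy = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
r_out_in = degree_mixing(toy, "out", "in")
r_in_in = degree_mixing(toy, "in", "in")
```

All four directed coefficients are obtained by varying the `source` and `target` arguments.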

Table 3 shows clustering statistics of citation networks. The mean clustering coefficients $\langle c \rangle$,
$\langle b \rangle$ and $\langle d \rangle$ greatly vary across the databases, with $\langle c \rangle \approx 0.15$ for WoS, CiteSeer and DBLP
databases, and $\langle c \rangle \approx 0.3$ in the case of Cora, HistCite and arXiv databases. This may be an artefact
of the coverage or the sampling procedure used for citation extraction, while clustering can also
reflect the amount of citations copied from other papers, known as indirect citation. Unbiased
clustering mixing coefficients $r_b$ and $r_d$ in Table 3 reveal strong positive correlations, similar to
other real-world networks. However, as before, $r_d = 0.26$ for DBLP database, while $r_d \approx 0.4$
for all others. Figure 1 plots clustering profiles of citation networks. Due to degree mixing biases,
$C(k) \sim k^{-\alpha}$ for $\alpha \approx 1$, while this behaviour is absent from corrected profiles $B(k)$ and $D(k)$.
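
As a point of reference, the standard mean clustering coefficient $\langle c \rangle$ can be computed as below. The sketch treats the citation network as undirected and assigns a coefficient of zero to papers with fewer than two neighbours, which is one common convention and an assumption here. The unbiased variants $\langle b \rangle$ and $\langle d \rangle$ need additional degree corrections that are not reproduced.

```python
from collections import defaultdict


def mean_clustering(citations):
    """Mean standard clustering coefficient of the undirected citation network."""
    neighbours = defaultdict(set)
    for u, v in citations:
        if u != v:  # ignore self-citations
            neighbours[u].add(v)
            neighbours[v].add(u)
    if not neighbours:
        return 0.0

    total = 0.0
    for adjacent in neighbours.values():
        k = len(adjacent)
        if k < 2:
            continue  # coefficient taken as zero for degree below two
        adjacent = list(adjacent)
        # Count links among the neighbours of the current paper.
        links = sum(
            1
            for i, a in enumerate(adjacent)
            for b in adjacent[i + 1:]
            if b in neighbours[a]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbours)


# Hypothetical toy network: a triangle a-b-c with a pendant paper d attached to c.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
c_mean = mean_clustering(toy)
```

In the toy network, papers a and b have coefficient 1, paper c has 1/3 and paper d has 0, which gives a mean of 7/12.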

Table 3 also shows diameter statistics of citation networks. Undirected effective diameter is
somewhat consistent across the databases, in contrast to the directed variant $\delta_{90}$, where $\delta_{90} \approx 8.5$
for WoS, HistCite and DBLP databases, while $\delta_{90} > 20$ for other databases. The low value of $\delta_{90}$ for
HistCite and DBLP databases is due to the limited coverage discussed above, whereas the respective
networks are also much smaller (see Table 1). On the other hand, the low $\delta_{90}$ for WoS database is due
to a rather non-intuitive phenomenon that real-world networks shrink as they grow. WoS database
includes 50 years of literature, while the time span of, e.g., arXiv database is merely 10 years. The
databases are thus not directly comparable in $\delta_{90}$ and none of them is necessarily inconsistent with the rest.

This can be observed more clearly in the hop plots shown in Fig. 1 (see, e.g., panels A and B).
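
A hop plot and the corresponding effective diameter can be sketched with a breadth-first search from every paper. The code below is an illustration under stated assumptions: it returns the smallest whole number of hops within which 90% of reachable pairs lie, whereas fractional values such as 8.5 suggest that the study interpolates between hops. An exhaustive search is also feasible only for small networks, so large networks would need sampling or approximation.

```python
from collections import defaultdict, deque


def hop_plot(citations, directed=False):
    """Return {h: number of ordered pairs of papers at distance exactly h}."""
    adjacency = defaultdict(set)
    papers = set()
    for citing, cited in citations:
        papers.update((citing, cited))
        adjacency[citing].add(cited)
        if not directed:
            adjacency[cited].add(citing)

    pairs = defaultdict(int)
    for source in papers:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            paper = queue.popleft()
            for nxt in adjacency.get(paper, ()):
                if nxt not in distance:
                    distance[nxt] = distance[paper] + 1
                    pairs[distance[nxt]] += 1
                    queue.append(nxt)
    return dict(pairs)


def effective_diameter(citations, directed=False, fraction=0.9):
    """Smallest number of hops within which `fraction` of reachable pairs lie."""
    pairs = hop_plot(citations, directed)
    total = sum(pairs.values())
    reached = 0
    for hops in sorted(pairs):
        reached += pairs[hops]
        if reached >= fraction * total:
            return hops
    return 0  # network without links


# Hypothetical toy network: a chain of five papers.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
undirected_pairs = hop_plot(chain)
directed_pairs = hop_plot(chain, directed=True)
```

Following citations only in their direction reaches fewer pairs than the undirected search, which is why the directed and undirected variants can behave so differently.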

Comparison of databases by individual statistics. The above discussion was in many cases just
qualitative. In the following, we reveal also statistically significant differences between some of the
databases with respect to individual graph statistics. Since the values of the statistics for a true
citation network are, obviously, not known, we compute externally studentized residuals that measure
the consistency of each database with the rest (Figure 2, panels A-F). Statistically significant inconsistencies in
individual statistics are revealed by independent two-tailed Student t-tests (see Methods).
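
The idea of an externally studentized residual can be illustrated as follows. The sketch assumes the textbook leave-one-out form, in which each value is compared with the mean and corrected standard deviation of the remaining values, and the residual follows a Student t-distribution with N - 2 degrees of freedom. The exact definition used in the study is given in its Methods and may differ in detail. The mean degrees below are hypothetical and serve only as an example.

```python
import math
import statistics

# Two-tailed critical value of Student's t-distribution at the 0.05 level
# with 4 degrees of freedom (six databases minus two).
T_CRITICAL_005_DF4 = 2.776


def studentized_residuals(values):
    """Externally studentized residual of each value with respect to the rest."""
    values = list(values)
    n = len(values)
    residuals = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]  # leave the i-th database out
        mean = statistics.mean(rest)
        std = statistics.stdev(rest)        # corrected sample standard deviation
        residuals.append((x - mean) / (std * math.sqrt(1 + 1 / (n - 1))))
    return residuals


# Hypothetical mean degrees of six databases; the last one is much denser.
mean_degree = [8.9, 8.7, 8.8, 8.6, 9.0, 24.0]
residuals = studentized_residuals(mean_degree)
inconsistent = [abs(t) > T_CRITICAL_005_DF4 for t in residuals]
```

Only the last database is flagged. The outlier inflates the standard deviation seen by every other database, so their residuals stay small, which is the reason for studentizing externally rather than against the full sample.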

WoS, CiteSeer and Cora databases show no significant differences at P-value = 0.05. On the
contrary, the scale-free in-degree exponent $\gamma_{in}$ in HistCite database is significantly higher than in
other databases, while the directed degree mixing coefficient $r_{(out,in)}$ is significantly lower (P-value =
0.019 and P-value = 0.033, respectively; see Table 2 and Fig. 2, panel D). This is a direct
consequence of the limited coverage already noted above. For example, since the database is derived from
a bibliography of a single author, highly cited papers are likely missing, which results in a much
steeper citation distribution $P(k_{in})$ and thus higher $\gamma_{in}$. Next, the unbiased clustering mixing
coefficient $r_d$ is significantly lower in DBLP database (P-value = 0.017; see Table 3 and Fig. 2, panel E).
Apparently, reducing the bibliographic database to only selected publications gives a rather
heterogeneous citation structure, which does not share high clustering assortativity, $r_d \gg 0$, of other
citation networks. Note that the differences in the field bow-tie decomposition of DBLP database
become statistically significant at P-value = 0.052 (see below). Finally, as thoroughly discussed
above, the citation information within arXiv database is significantly more exhaustive with much
higher mean degree $\langle k \rangle$ (P-value = 0.009; see Table 2 and Fig. 2, panel F). Notice that statistically
significant inconsistencies between the databases are, expectedly, merely a subset of the differences
exposed through the expert analysis above. Still, in summary, the results reveal that bibliographic
databases with substantially different coverage have significantly different citation topology.

At P-value = 0.1, several other inconsistencies become statistically significant. For CiteSeer
database, the largest weakly connected component is significantly smaller than in other databases
(P-value = 0.059; see Table 1 and Fig. 2, panel B); for HistCite database, the clustering mixing
coefficient $r_c$ is lower (P-value = 0.066; see Table 3 and Fig. 2, panel D); for DBLP database,
the in-field component is larger (P-value = 0.052; see Table 1 and Fig. 2, panel E), while the
field core and the directed degree mixing coefficient $r_{(in,in)}$ are smaller (P-value = 0.090 and
P-value = 0.095, respectively; see Table 1, Table 2 and Fig. 2, panel E); and for arXiv database,
the undirected degree mixing coefficient $r$ and the corrected clustering coefficient $\langle b \rangle$ are higher
(P-value = 0.081; see Table 2, Table 3 and Fig. 2, panel F). Note that, due to space limitations,
not all inconsistencies at P-value = 0.1 are discussed in the analysis above.

Selection of independent graph statistics. Since the adopted graph statistics of citation networks
are by no means independent, we cannot simply compare the bibliographic databases over all of them.
For this purpose, we select ten statistics listed in Fig. 2, panel G, and verify their statistical
independence (see Methods). We compute Fisher transformations of the pairwise Spearman correlations
between the statistics, while significant correlations are revealed by independent two-tailed z-tests
(Figure 2, panel H). Notice that no correlation is statistically significant at P-value = 0.01.
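
The independence check can be illustrated with the Spearman correlation of two statistics across the databases, followed by a Fisher transformation and a two-tailed z-test. The sketch assumes the basic form $z = \operatorname{artanh}(\rho)\sqrt{N - 3}$; whether the study applies a further correction for rank correlations is not stated in this section. The example values are hypothetical.

```python
import math


def ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for position in range(i, j + 1):
            result[order[position]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def fisher_z_test(rho, n):
    """z-score and two-tailed P-value for the hypothesis of no correlation.

    Raises ValueError for rho equal to +1 or -1, where the transformation diverges.
    """
    z = math.atanh(rho) * math.sqrt(n - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical values of two graph statistics over six databases.
stat_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stat_b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
rho = spearman(stat_a, stat_b)
z, p_value = fisher_z_test(rho, len(stat_a))
```

For this pair the correlation would be significant at the 0.05 level but not at 0.01. With only six databases the test has little power, which is consistent with the caution expressed in the Discussion that the statistics are merely shown to be not clearly dependent.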

The selection of independent graph statistics proceeds as follows. We first discard statistics that
are sums or aggregates of the others by definition. These are the sizes of the largest weakly connected
and out-field components (see Table 1), the scale-free degree exponent $\gamma$, the undirected degree
mixing $r$ and also both mixed directed mixing coefficients $r_{(in,out)}$ and $r_{(out,in)}$ (see Table 2).
We next discard statistics whose correlations have been proven in the literature or are dependent
on some intrinsic characteristic of the database like the time span of the publications (see above).
These are the standard clustering $\langle c \rangle$ and the corresponding mixing coefficient $r_c$, and the directed
effective diameter $\delta_{90}$ (see Table 3). Finally, of the two unbiased clustering coefficients $\langle b \rangle$ and
$\langle d \rangle$, we choose the latter, and its corresponding mixing coefficient $r_d$ (see Table 3). We are thus
left with ten statistics (Figure 2, panel G). These are the sizes of the in-field and core components
(see Table 1), the mean degree $\langle k \rangle$, the directed scale-free exponents $\gamma_{in}$ and $\gamma_{out}$, and the directed
degree mixing coefficients $r_{(in,in)}$ and $r_{(out,out)}$ (see Table 2), the unbiased clustering $\langle d \rangle$ and its
corresponding mixing coefficient $r_d$, and the undirected effective diameter (see Table 3).

For some further notes on statistics independence see Discussion. 

Comparison of databases over multiple statistics. In the following, we compare the bibliographic
databases over independent graph statistics selected above. We rank the databases according
to the studentized statistics residuals and compute their mean ranks over all statistics (see Methods).
The final ranks are 2.2 for WoS database, 3.1 for both CiteSeer and Cora databases, 3.6 for arXiv
database, 4.0 for HistCite database and 5.0 for DBLP database. Notice that the ranks indeed reflect
the conclusions on database consistency given above. We reject the null hypothesis that the ranks
of the databases are statistically equivalent by one-tailed Friedman test at P-value = 0.05 and thus
compare the ranks by two-tailed Nemenyi post-hoc test (Figure 2, panel I). The databases whose
ranks differ by more than a critical distance 2.38 show statistically significant inconsistencies in the
selected statistics at P-value = 0.05. Hence, the citation topology of WoS database is significantly
more reliable than that of DBLP database, which is the least consistent with the rest. On the other
hand, the differences between other databases are not statistically significant, although concluding
that these are consistent with both WoS and DBLP databases would be statistically unsound. At
P-value = 0.1, the critical distance drops to 2.17, while all conclusions still remain the same.
Interestingly, neglecting the requirement for the independence of graph statistics and comparing the
bibliographic databases over all 21 statistics again gives exactly the same conclusions on their
consistency, although the ranking changes, since arXiv database is ranked in front of Cora database.

For some further notes on database consistency see Discussion. 
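
The reported critical distances can be reproduced from the mean ranks with the usual formulas of the Friedman and Nemenyi tests, $\chi^2_F = \frac{12N}{k(k+1)}\left(\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right)$ and $CD = q_\alpha \sqrt{k(k+1)/(6N)}$, for $k = 6$ databases and $N = 10$ statistics. The sketch below is an illustration, not the study's code. The constants `Q_ALPHA` are the tabulated Nemenyi critical values for six groups, and since the mean ranks are rounded to one decimal place in the text, the Friedman statistic is only approximate.

```python
import math
from itertools import combinations

# Mean ranks of the databases as reported in the text (rounded to one decimal).
mean_ranks = {"WoS": 2.2, "CiteSeer": 3.1, "Cora": 3.1,
              "arXiv": 3.6, "HistCite": 4.0, "DBLP": 5.0}
N_STATISTICS = 10

# Tabulated critical values of the Nemenyi test for six groups.
Q_ALPHA = {0.05: 2.850, 0.10: 2.589}

# Critical value of the chi-square distribution at 0.05 with 5 degrees of freedom.
CHI2_CRITICAL_005_DF5 = 11.07


def friedman_statistic(ranks, n):
    """Friedman chi-square statistic from mean ranks over n statistics."""
    k = len(ranks)
    squares = sum(r * r for r in ranks.values())
    return 12 * n / (k * (k + 1)) * (squares - k * (k + 1) ** 2 / 4)


def critical_distance(k, n, q_alpha):
    """Critical distance of the Nemenyi post-hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def significant_pairs(ranks, n, q_alpha):
    """Pairs of databases whose mean ranks differ by more than the critical distance."""
    cd = critical_distance(len(ranks), n, q_alpha)
    return [(a, b) for a, b in combinations(ranks, 2)
            if abs(ranks[a] - ranks[b]) > cd]


chi2 = friedman_statistic(mean_ranks, N_STATISTICS)
cd_005 = critical_distance(len(mean_ranks), N_STATISTICS, Q_ALPHA[0.05])
cd_010 = critical_distance(len(mean_ranks), N_STATISTICS, Q_ALPHA[0.10])
pairs_005 = significant_pairs(mean_ranks, N_STATISTICS, Q_ALPHA[0.05])
pairs_010 = significant_pairs(mean_ranks, N_STATISTICS, Q_ALPHA[0.10])
```

The two critical distances round to 2.38 and 2.17, and at both levels the only pair exceeding them is WoS and DBLP, whose mean ranks differ by 2.8. The next largest difference is 1.9, which explains why lowering the level to 0.1 does not change the conclusions.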

Comparison of bibliographic and online databases. To assess the power of the employed
statistical methodology for quantifying the differences in network topology, we compare citation networks
representing different bibliographic databases with two networks extracted from online databases.
These are a technological network of Gnutella peer-to-peer file sharing (http://rfc-gnutella.sourceforge.net)
from August 2002, where nodes are hosts and links are shares between them; and a social network
representing Twitter social circles (http://twitter.com) crawled from public repositories, where
nodes are users and links are follows between them. Both these networks are provided within SNAP
(http://snap.stanford.edu), while their basic descriptive statistics are given in Table 1.

Note that online databases are known to reveal a different network topology than reliable
bibliographic databases. For example, the majority of nodes in Gnutella database is included in the in-field
component (see Methods), similarly as in DBLP database (see Table 1). Next, the mean degree $\langle k \rangle$ is
considerably higher in Twitter database and lower in Gnutella database (see Table 2). Furthermore,
the degree distributions of Gnutella database are not a valid fit to a power-law form, with higher
scale-free degree exponents $\gamma$ than in other databases (see Table 2). On the contrary, the scale-free
out-degree exponent $\gamma_{out}$ of Twitter database is lower, similarly as in HistCite database. Online
databases also reveal notably different clustering regimes than bibliographic databases (see Table 3).
The standard and unbiased clustering coefficients $\langle c \rangle$ and $\langle d \rangle$ are much higher in Twitter database,
while much lower in Gnutella database. Finally, Gnutella database shows relatively heterogeneous
clustering structure with very low unbiased clustering mixing coefficients $r_b$ and $r_d$.

In the following, we reveal statistically significant inconsistencies between some of the databases
with respect to individual graph statistics (see Methods). We consider the online databases and four
most reliable bibliographic databases so that all critical values remain the same as before. Under this
setting, the bibliographic databases show no inconsistencies at P-value = 0.05 (Figure 3, panels
A-D). On the other hand, five most significant inconsistencies of online databases almost precisely
coincide with the differences exposed through the analysis above (Figure 3, panels E and F). For
Gnutella database, the in-field component is larger (P-value = 0.008), the degree and in-degree
scale-free exponents $\gamma$ and $\gamma_{in}$ are higher (P-value = 0.011 and P-value = 0.008, respectively),
and the unbiased clustering mixing coefficients $r_b$ and $r_d$ are lower (P-value = 0.032 and P-value =
0.011, respectively); and for Twitter database, the mean degree $\langle k \rangle$ is higher (P-value = 0.039), the
out-degree scale-free exponent $\gamma_{out}$ and the directed degree mixing coefficient are lower
(P-value = 0.063 and P-value = 0.066, respectively), and the standard and unbiased clustering
coefficients $\langle c \rangle$ and $\langle d \rangle$ are higher (P-value = 0.056 and P-value = 0.065, respectively).

In the remainder, we also rank the databases over multiple graph statistics as before (see
Methods). We select ten statistics listed in Fig. 3, panel G, whose pairwise independence is confirmed at
P-value = 0.001 (Figure 3, panel H). The overall ranks of the databases are not statistically
equivalent at P-value = 0.05 and are given in Fig. 3, panel I. Expectedly, the online databases are the
least consistent with the rest, with ranks of 4.6 and 4.9 for Gnutella and Twitter databases,
respectively, and 1.9-3.3 for the bibliographic databases. Yet, merely WoS bibliographic database
significantly differs from the online databases at P-value = 0.05 (see Fig. 3, panel I).

In summary, the employed statistical testing proves to be rather effective in quantifying the
inconsistencies between network databases with respect to individual graph statistics. On the contrary,
the comparison over multiple statistics appears to be less powerful and cannot distinguish between
the online databases and all bibliographic databases considered above. Nevertheless, the statistically
significant inconsistencies between WoS and DBLP bibliographic databases highlighted in the study
can thus indeed be regarded as rather substantial.


Discussion


We conduct an extensive statistical analysis of the citation information within six popular
bibliographic databases. We extract citation networks and compare their topological consistency through
a large number of graph statistics. We expose statistically significant inconsistencies between some
of the databases with respect to individual graph statistics and compare the databases over multiple
statistics. DBLP Computer Science Bibliography is found to be the least consistent with the rest,
while Web of Science is significantly more reliable from this perspective. The result is somewhat
surprising, since DBLP Computer Science Bibliography is informally considered as one of the most
accurate freely available sources of computer science literature. The analysis further reveals that the
coverage of the database and the time span of the literature greatly affect the overall citation
topology, although this can be avoided in the case of the latter. This work can serve either as a reference
for the analyses of citation networks in bibliometrics and scientometrics literature or a guideline for
scientific evaluation based on some particular bibliographic database or literature coverage policy.

We introduce the field bow-tie decomposition of a citation network (see Methods), which proves
to be one of the most discriminative approaches for comparing the citation topology of bibliographic
databases (see Results). We also consider 18 other local and global graph statistics. Nevertheless, we
neglect some possible common patterns of nodes like motifs and graphlets, and the occurrence
of larger characteristic groups of nodes like communities and modules. Yet, these structures are
not well understood for the specific case of citation networks and thus not easily interpretable.

In the following, we provide some further notes on the representativeness and reliability of the 
bibliographic databases, and the independence of the databases and adopted graph statistics. 

As discussed in Methods, citation networks extracted from bibliographic databases are not
necessarily representative due to the citation retrieval procedure, data preprocessing techniques, size or
other factors. It should, however, be noted that this work has been done after realizing that citation networks
available from the Web provide a rather inconsistent view on the structure of bibliographic
information. We have therefore collected and compared all such networks, while including also a citation
network extracted from Web of Science. In that sense, the adopted networks are representative of
the data readily available for the analyses and thus also commonly used in the literature. Still,
other citation networks could give different conclusions on the reliability of bibliographic databases,
in particular because the reliability is measured through consistency of the databases. The concepts
are of course not equivalent, yet the study reveals that, in most cases, only a single database deviates
from a common behaviour for some particular graph statistic (see Results). Hence, the reliability
can indeed be seen as a deviation from the majority to a rather good approximation.

Independence between bibliographic databases is obtained trivially, since these are either based
on independent bibliographic sources or cover different literature (see Methods). On the other hand,
adopted graph statistics of citation networks are by no means independent. As independence is required
by several statistical tests, we reduce the statistics to a subset whose pairwise independence could be
proven. Nevertheless, we only show that the statistics are not clearly dependent and we do not ensure
their mutual independence. Although the conclusions of the study are exactly the same regardless of
whether it is based on all or merely independent statistics (see Results), further reducing the subset
of statistics would discard relevant information and no statistically significant conclusions could
be made. We also stress that all results have been verified by an independent expert analysis. An
alternative solution would be to transform the statistics into uncorrelated representatives using matrix
factorization techniques like principal component analysis. However, interpreting inconsistencies
in, e.g., $0.9\gamma_{in} - 1.4 r_c + 0.8\delta_{90}$ would most likely be far from trivial.


Methods 


Bibliographic sources 

In this study, we conduct a network-based comparison of citation topology of six bibliographic databases. These
have been extracted from publicly available and commercial bibliographic sources, services, software and a
preprint repository with particular focus on computer science publications. For bibliographic sources based on
a similar methodology (e.g., Web of Science and Scopus, CiteSeer and Google Scholar), a single exemplar
has been selected. We have extracted a citation network from each of the selected databases. Publications
neither citing nor cited by any other are discarded and any self-citations that occur due to errors in the databases
are removed prior to the analysis (see below and Table 1 for details). Although the databases contain fair
portions of the respective bibliographic sources, we stress that they are not all necessarily representative. Still,
in most cases, these are the only examples of citation networks readily available online (to our knowledge)
and thus also often used in the network analysis literature.
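
The preprocessing described above, namely removing self-citations and discarding publications that neither cite nor are cited by any other, can be sketched as follows. The function name `clean_citations` and the toy data are hypothetical, and the sketch assumes that self-citations are removed first, so that a paper linked only to itself is also discarded.

```python
def clean_citations(papers, citations):
    """Remove self-citations, then drop papers left without any citation link.

    `papers` is an iterable of paper identifiers and `citations` an iterable
    of (citing, cited) pairs. Returns the retained papers and citations.
    """
    kept = [(citing, cited) for citing, cited in citations if citing != cited]
    linked = {paper for citation in kept for paper in citation}
    return [paper for paper in papers if paper in linked], kept


# Hypothetical toy data: c only cites itself and d has no citations at all.
papers = ["a", "b", "c", "d"]
citations = [("a", "b"), ("c", "c")]
kept_papers, kept_citations = clean_citations(papers, citations)
```

The numbers of discarded publications and self-citations obtained in this way correspond to the counts listed for each database below.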

WoS database. Web of Science (WoS) is informally considered as the most accurate bibliographic source in
the world. It is hand-maintained by professional staff at Thomson Reuters (http://thomsonreuters.com),
previously Institute for Scientific Information. It dates back to the 1950s and contains over 45 million records
of publications from all fields of science. For this study, we consider all journal papers in WoS category
Computer Science, Artificial Intelligence as of October 2013. The extracted database spans 50 years, and
contains 179,510 papers from 877 journals and 639,126 citations between them. Note that 39,148 papers
neither cite nor are cited by any other, while the database includes 16 self-citations.


CiteSeer database. CiteSeer or CiteSeerX (CiteSeer) is constructed by automatically crawling the Web for
freely accessible manuscripts of publications and then analyzing the latter for potential citations to other
publications (http://citeseer.ist.psu.edu). It became publicly available in 1998 and is maintained by
Pennsylvania State University. It contains over 32 million publication records from computer and information
science. For this study, we consider a snapshot of the database provided within KONECT (http://konect.uni-koblenz.de)
that contains 723,131 publications and 1,751,492 citations between them. Note that 338,718 publications
neither cite nor are cited by any other, while the database includes 6,873 self-citations.

Cora database. Computer Science Research Paper Search Engine (Cora) is a service for automatic retrieval
of publication manuscripts from the Web using machine learning techniques (http://people.cs.umass.edu/~mccallum).
It contains over 50,000 publication records collected from the websites of computer science departments at
major universities in August 1998. For this study, we consider a subset of the database that contains 23,166
publications and 91,500 citations between them (http://lovro.lpt.fri.uni-lj.si). Note that all papers
either cite or are cited by some other, while the database includes no self-citations.


HistCite database. Algorithmic Historiography (HistCite) is a software package for analysis and visualization
of bibliographic databases owned by Thomson Reuters (http://www.histcite.com). It was developed
in the 2000s for extracting publication records from WoS database. For this study, we consider a complete
bibliography of Nobel laureate Joshua Lederberg produced by HistCite in February 2008. The database
contains 8,843 publications and 41,609 citations between them (http://vlado.fmf.uni-lj.si). Note that
4,519 publications neither cite nor are cited by any other, while the database includes 14 self-citations.

DBLP database. DBLP Computer Science Bibliography (DBLP) indexes major journals and proceedings
from all fields of computer science (http://dblp.uni-trier.de). It has been freely available since 1993 and
is hand-maintained by University of Trier. It contains more than 2.3 million records of publications, while the
citation information is extremely scarce compared to WoS and CiteSeer databases. For this study, we
consider a snapshot of the database provided within KONECT (http://konect.uni-koblenz.de) that contains
12,591 journal and conference papers, and 49,759 citations between them. Note that all papers either cite or
are cited by some other, while the database includes 15 self-citations.

arXiv database. arXiv.org (arXiv) is a public preprint repository of publication drafts uploaded by the
authors prior to an actual journal or conference submission (http://arxiv.org). It began in 1991 and is
hosted at Cornell University. It currently contains almost one million publications from physics, mathematics,
computer science and other fields. For this study, we consider all publications in arXiv category High
Energy Physics Phenomenology as of April 2003 provided within SNAP (http://snap.stanford.edu). The
database spans over 10 years, and contains 34,546 publications and 421,578 citations between them. Note that
all publications either cite or are cited by some other, while the database includes 44 self-citations.


Citation topology 

Citation networks extracted from bibliographic databases are represented with directed graphs, where papers 
are nodes of the graph and citations are directed links between the nodes. The topology of citation networks is 
assessed through a rich set of local and global graph statistics. 

Descriptive and field statistics. The citation network is a simple directed graph $G(V, L)$, where $V$ is the set
of nodes, $n = |V|$, and $L$ is the set of links, $m = |L|$. Weakly connected component (WCC) is a subset of
nodes reachable from one another not considering the directions of the links. Field bow-tie is a decomposition
of the largest WCC of a citation network into the in-field component, which consists of nodes with no outgoing
links, the out-field component, which consists of nodes with no incoming links, and the field core.
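
The field bow-tie decomposition can be illustrated with a short Python sketch. It is not the code used in the study: the function name `field_bow_tie` and the input format (a list of (citing, cited) pairs) are hypothetical, and it assumes that self-citations are dropped and that isolated nodes are ignored before the percentages are computed.

```python
from collections import defaultdict


def field_bow_tie(links):
    """Fractions of nodes in the largest WCC and its in-field, core and out-field parts.

    `links` is a non-empty iterable of (citing, cited) pairs. Self-citations are
    dropped and isolated nodes are ignored (both are assumptions of this sketch).
    """
    succ, pred, nbrs = defaultdict(set), defaultdict(set), defaultdict(set)
    for u, v in links:
        if u == v:
            continue
        succ[u].add(v)
        pred[v].add(u)
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Largest weakly connected component: traverse links in both directions.
    seen, largest = set(), set()
    for start in nbrs:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for other in nbrs[node]:
                if other not in component:
                    component.add(other)
                    stack.append(other)
        seen |= component
        if len(component) > len(largest):
            largest = component

    in_field = {x for x in largest if not succ[x]}   # no outgoing links
    out_field = {x for x in largest if not pred[x]}  # no incoming links
    core = largest - in_field - out_field
    n = len(nbrs)
    parts = [("wcc", largest), ("in_field", in_field),
             ("core", core), ("out_field", out_field)]
    return {name: len(part) / n for name, part in parts}
```

For a toy network with links 1→2, 2→3, 3→4, 1→3 and a separate pair 5→6, the largest WCC holds four of the six nodes, node 4 forms the in-field component, node 1 the out-field component, and nodes 2 and 3 the core.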


Degree distributions and mixing. The in-degree $k_{in}$ or out-degree $k_{out}$ of a node is the number of incoming
or outgoing links, respectively, $k$ is the degree of a node, $k = k_{in} + k_{out}$, and $\langle k \rangle$ denotes the mean degree.
$\gamma$ is the scale-free exponent of a power-law degree distribution $P(k) \sim k^{-\gamma}$, and $\gamma_{in}$ and $\gamma_{out}$ are the
scale-free exponents of $P(k_{in})$ and $P(k_{out})$. Power-laws are fitted to the tails of the distributions by
maximum-likelihood estimation, $\gamma_\cdot = 1 + n \left( \sum \ln (k_\cdot / k_{min}) \right)^{-1}$ for $k_{min} \in \{10, 25\}$. Neighbour connectivity plots
show the mean neighbour degree $N(k_\cdot)$ of nodes with degree $k_\cdot$. The degree mixing $r_{(\alpha,\beta)}$ is the Pearson
correlation coefficient of $\alpha$-degrees or $\beta$-degrees at links' source and target nodes, respectively,

$$r_{(\alpha,\beta)} = \frac{1}{m \, \sigma_{k_\alpha} \sigma_{k_\beta}} \sum_{L} \left( k_\alpha - \langle k_\alpha \rangle \right) \left( k_\beta - \langle k_\beta \rangle \right), \tag{1}$$

where $\langle k_\cdot \rangle$ and $\sigma_{k_\cdot}$ are the means and standard deviations, $\alpha, \beta \in \{in, out\}$, and $r$ is the mixing of degrees $k$.
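
Both estimates in this paragraph are easy to sketch in Python. The function names below are hypothetical, the exponent estimate assumes that $n$ counts only the nodes in the fitted tail (degree at least $k_{min}$), and the mixing coefficient is a plain Pearson correlation over links with the source degree of type alpha and the target degree of type beta.

```python
import math
from collections import defaultdict


def powerlaw_exponent(degrees, k_min):
    """Maximum-likelihood scale-free exponent fitted to the tail k >= k_min.

    Assumption: n in the estimator is the number of nodes in the tail.
    Raises ZeroDivisionError if every tail degree equals k_min.
    """
    tail = [k for k in degrees if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / k_min) for k in tail)


def degree_mixing(links, alpha="out", beta="in"):
    """Pearson correlation of alpha-degrees at sources and beta-degrees at targets."""
    links = list(links)
    degree = {"in": defaultdict(int), "out": defaultdict(int)}
    for u, v in links:
        degree["out"][u] += 1
        degree["in"][v] += 1

    xs = [degree[alpha][u] for u, _ in links]
    ys = [degree[beta][v] for _, v in links]
    m = len(links)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / m)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / m)
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation undefined for constant degrees; 0 by convention here
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / (m * sd_x * sd_y)
```

Calling `degree_mixing(links, "in", "in")` and the other three combinations would give the four directed coefficients, while the undirected coefficient $r$ would use the total degrees $k$ at both ends of each link.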


Clustering distributions and mixing. Node clustering coefficient $c$ is the density of its neighbourhood,

$$c = \frac{2t}{k(k - 1)}, \tag{2}$$

where $t$ is the number of linked neighbours and $k(k - 1)/2$ is the maximum possible number, $c = 0$ for
$k \leq 1$. The mean $\langle c \rangle$ is denoted network clustering coefficient, while the clustering mixing $r_c$ is defined as
before. Clustering profile shows the mean clustering $C(k)$ of nodes with degree $k$. Note that the denominator
in equation (2) introduces biases, particularly when $r < 0$. Thus, delta-corrected clustering coefficient $b$ is
defined as $c \cdot k / \Delta$, where $\Delta$ is the maximal degree $k$ and $b = 0$ for $k \leq 1$. Also, degree-corrected clustering
coefficient $d$ is defined as $t / \omega$, where $\omega$ is the maximum number of linked neighbours with respect to their
degrees $k$ and $d = 0$ for $k \leq 1$. By definition, $b \leq c \leq d$.
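
As a minimal sketch of the standard and delta-corrected coefficients, the Python function below computes $c$ and $b$ for every node. The name `clustering` is hypothetical, and the sketch assumes that link directions are ignored, that self-citations are dropped and that no pair of papers cites each other. The degree-corrected coefficient $d$ is omitted, since computing $\omega$ needs a separate procedure.

```python
from collections import defaultdict
from itertools import combinations


def clustering(links):
    """Return two dicts: node -> c (standard) and node -> b (delta-corrected).

    Assumption: the network is treated as a simple undirected graph.
    """
    nbrs = defaultdict(set)
    for u, v in links:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    nbrs = dict(nbrs)
    delta = max(len(ns) for ns in nbrs.values())  # maximal degree in the network

    c, b = {}, {}
    for node, ns in nbrs.items():
        k = len(ns)
        if k <= 1:
            c[node] = b[node] = 0.0
            continue
        # t: number of links among the neighbours of the node
        t = sum(1 for y, z in combinations(ns, 2) if z in nbrs[y])
        c[node] = 2.0 * t / (k * (k - 1))
        b[node] = c[node] * k / delta
    return c, b
```

On a triangle 1, 2, 3 with an extra node 4 attached to node 3, nodes 1 and 2 have $c = 1$ and $b = 2/3$, node 3 has $c = b = 1/3$, and node 4 has $c = b = 0$, which agrees with $b \leq c$.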


Diameter statistics. Hop plot shows the percentage of reachable pairs of nodes $H(\delta)$ within $\delta$ hops. The
diameter is the minimal number of hops $\delta$ for which $H(\delta) = 1$, while the effective diameter $\delta_{90}$ is defined
as the number of hops at which 90% of such pairs of nodes are reachable, $H(\delta_{90}) = 0.9$. $\delta'$ denotes the
respective number of hops in a corresponding undirected graph. Hop plots are estimated over 100 realizations
of the approximate neighbourhood function with 32 trials.
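
For small networks the hop plot can be computed exactly, which makes the definitions concrete. The sketch below swaps the approximate neighbourhood function for a breadth-first search from every node, so it is only practical for toy examples. The function names are hypothetical, and the linear interpolation between integer hops in `effective_diameter` is an assumption about how non-integer values of $\delta_{90}$ arise.

```python
from collections import deque


def hop_plot(links, directed=True):
    """Return {hops: fraction of reachable ordered node pairs within that many hops}."""
    succ = {}
    for u, v in links:
        succ.setdefault(u, set())
        succ.setdefault(v, set())
        if u == v:
            continue
        succ[u].add(v)
        if not directed:
            succ[v].add(u)

    pairs_at = {}  # exact distance -> number of ordered pairs at that distance
    for source in succ:
        dist, queue = {source: 0}, deque([source])
        while queue:
            node = queue.popleft()
            for other in succ[node]:
                if other not in dist:
                    dist[other] = dist[node] + 1
                    pairs_at[dist[other]] = pairs_at.get(dist[other], 0) + 1
                    queue.append(other)

    total, reached, plot = sum(pairs_at.values()), 0, {}
    for hops in sorted(pairs_at):
        reached += pairs_at[hops]
        plot[hops] = reached / total
    return plot


def effective_diameter(plot, fraction=0.9):
    """Hops at which `fraction` of reachable pairs are covered (linear interpolation)."""
    previous = 0.0
    for hops in sorted(plot):
        if plot[hops] >= fraction:
            return hops - 1 + (fraction - previous) / (plot[hops] - previous)
        previous = plot[hops]
    return float(max(plot))
```

The diameter is then `max(plot)`, and passing `directed=False` gives the undirected variants $\delta'$ and $\delta'_{90}$.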


Statistical comparison 

Citation networks representing bibliographic databases are compared through 21 graph statistics introduced
above. These are by no means independent, nor are their values for a true citation network known. We
thus compute externally studentized residuals of graph statistics that measure the consistency of each
bibliographic database with the rest. Statistically significant inconsistencies in individual graph statistics are revealed
by Student t-test. We select ten graph statistics whose pairwise independence is verified using Fisher
z-transformation. Friedman rank test confirms that bibliographic databases display significant inconsistencies in
the selected statistics, while the databases with no significant differences are revealed by Nemenyi test.


Studentized statistics residuals. Denote $x_{ij}$ to be the value of $j$-th graph statistic of $i$-th bibliographic
database, where $N$ is the number of databases, $N = 6$. Corresponding externally studentized residual $\hat{x}_{ij}$ is:

$$\hat{x}_{ij} = \frac{x_{ij} - \hat{\mu}_{ij}}{\hat{\sigma}_{ij} \sqrt{1 - 1/N}}, \tag{3}$$

where $\hat{\mu}_{ij}$ and $\hat{\sigma}_{ij}$ are the sample mean and corrected standard deviation excluding the considered $i$-th database,
$\hat{\mu}_{ij} = \sum_{k \neq i} x_{kj} / (N - 1)$ and $\hat{\sigma}_{ij}^2 = \sum_{k \neq i} (x_{kj} - \hat{\mu}_{ij})^2 / (N - 2)$. Assuming that the errors in $x$ are
independent and normally distributed, the residuals $\hat{x}$ have Student t-distribution with $N - 2$ degrees of freedom.
Significant differences in individual statistics $x$ are revealed by independent two-tailed Student t-test at
P-value = 0.05, rejecting the null hypothesis $H_0$ that $x$ are consistent across the databases, $H_0 : \hat{x} = 0$.
Notice that the absolute values of individual residuals $|\hat{x}|$ imply a ranking $R$ over the databases, where the database
with the lowest $|\hat{x}|$ has rank one, the second one has rank two and the one with the largest $|\hat{x}|$ has rank $N$.
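
Equation (3) and the induced ranking can be sketched in a few lines of Python. The function names are hypothetical, the sketch follows the equation exactly as written above, and ties between residuals are broken by position rather than by averaging ranks, which is an assumption.

```python
import math


def studentized_residuals(x):
    """Externally studentized residuals of one graph statistic over N databases.

    `x` lists the values of a single statistic, one per database (N >= 3).
    """
    n_db = len(x)
    residuals = []
    for i, value in enumerate(x):
        rest = x[:i] + x[i + 1:]  # leave the i-th database out
        mean = sum(rest) / (n_db - 1)
        var = sum((v - mean) ** 2 for v in rest) / (n_db - 2)
        residuals.append((value - mean) / (math.sqrt(var) * math.sqrt(1 - 1 / n_db)))
    return residuals


def residual_ranks(residuals):
    """Rank 1 for the smallest absolute residual, rank N for the largest."""
    order = sorted(range(len(residuals)), key=lambda i: abs(residuals[i]))
    ranks = [0] * len(residuals)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks
```

With $N = 6$ the residuals would be compared against the two-tailed critical value of the Student t-distribution with four degrees of freedom, which is about 2.776 at P-value = 0.05.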


Pairwise statistics independence. Denote $r_{ij}$ to be the Pearson product-moment correlation coefficient of
the residuals $\hat{x}$ for $i$-th and $j$-th graph statistics over all bibliographic databases. Spearman rank correlation
coefficient $\rho_{ij}$ is defined as the Pearson coefficient of the ranks $R$ for $i$-th and $j$-th statistics. Under the null
hypothesis of statistical independence of $i$-th and $j$-th statistics, $H_0 : \rho_{ij} = 0$, adjusted Fisher transformation

$$\frac{\sqrt{N - 3}}{2} \ln \frac{1 + r_{ij}}{1 - r_{ij}} \tag{4}$$

approximately follows a standard normal distribution. Pairwise independence of the selected graph statistics is
thus confirmed by independent two-tailed z-tests at P-value = 0.01.
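
The independence test can be sketched as follows. The helper names are hypothetical, the coefficient passed to `fisher_z` may be either the Pearson or the Spearman coefficient (the latter obtained by calling `pearson` on ranks), and the two-tailed P-value uses the standard normal distribution via the complementary error function.

```python
import math


def pearson(a, b):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def fisher_z(r, n_db):
    """Adjusted Fisher transformation of a correlation over n_db databases."""
    return math.sqrt(n_db - 3) / 2 * math.log((1 + r) / (1 - r))


def two_tailed_p(z):
    """Two-tailed P-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Two statistics would be treated as independent when `two_tailed_p(fisher_z(r, 6))` exceeds 0.01. With only six databases the normal approximation is coarse, so the outcome is best read as a screening step.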


Comparison of bibliographic databases. Significant inconsistencies between bibliographic databases are
exposed using the methodology introduced for comparing classification algorithms over multiple data sets.
Denote $R_i$ to be the mean rank of $i$-th database over the selected graph statistics, $R_i = \sum_j R_{ij} / K$, where
$K$ is the number of statistics, $K = 10$. One-tailed Friedman rank test first verifies the null hypothesis
that the databases are statistically equivalent and thus their ranks $R_i$ should be equal, $H_0 : R_i = R_j$. Under the
assumption that the selected statistics are indeed independent, the Friedman testing statistic

$$\frac{12K}{N(N + 1)} \left( \sum_i R_i^2 - \frac{N(N + 1)^2}{4} \right) \tag{5}$$

has $\chi^2$-distribution with $N - 1$ degrees of freedom. By rejecting the hypothesis at P-value = 0.05, we proceed
with the Nemenyi post-hoc test that reveals databases whose ranks $R_i$ differ more than the critical difference

$$q \sqrt{\frac{N(N + 1)}{6K}}, \tag{6}$$

where $q$ is the critical value based on the studentized range statistic, $q = 2.85$ at P-value = 0.05. A critical
difference diagram plots the databases with no statistically significant inconsistencies in the selected statistics.
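
Equations (5) and (6) reduce to simple arithmetic, sketched below in Python. The function names are hypothetical, and the mean ranks in the test values are made up for illustration, not taken from the study.

```python
import math


def friedman_statistic(mean_ranks, n_stats):
    """Friedman testing statistic for N databases ranked over n_stats statistics."""
    n_db = len(mean_ranks)
    squares = sum(rank * rank for rank in mean_ranks)
    return 12 * n_stats / (n_db * (n_db + 1)) * (squares - n_db * (n_db + 1) ** 2 / 4)


def critical_difference(n_db, n_stats, q=2.85):
    """Nemenyi critical difference; q = 2.85 is the value quoted for N = 6."""
    return q * math.sqrt(n_db * (n_db + 1) / (6 * n_stats))


def equivalent_pairs(mean_ranks, n_stats, q=2.85):
    """Index pairs of databases whose mean ranks differ by at most the critical difference."""
    cd = critical_difference(len(mean_ranks), n_stats, q)
    return [(i, j)
            for i in range(len(mean_ranks))
            for j in range(i + 1, len(mean_ranks))
            if abs(mean_ranks[i] - mean_ranks[j]) <= cd]
```

For $N = 6$ and $K = 10$ the critical difference is $2.85 \sqrt{0.7} \approx 2.38$, so two databases are reported as significantly different only when their mean ranks are more than about 2.38 apart. The Friedman statistic would be compared against the $\chi^2$ critical value with five degrees of freedom, about 11.07 at P-value = 0.05.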


Figure Legends


Figure 1: Profile of citation networks extracted from bibliographic databases. Panels A-F show different
distributions, plots and profiles of citation networks extracted from bibliographic databases. These are (from left
to right): the field bow-tie decompositions, where the arrows illustrate the direction of the links and the areas
of components are proportional to the number of nodes contained; the degree, in-degree and out-degree
distributions $P(k)$, $P(k_{in})$ and $P(k_{out})$, respectively; the corresponding neighbour connectivity plots $N(k)$, $N(k_{in})$ and
$N(k_{out})$; the clustering profiles of the standard and both unbiased coefficients $C(k)$, $B(k)$ and $D(k)$, respectively;
and the hop plots for the standard and undirected diameters $\delta$ and $\delta'$, respectively (see Methods).


Figure 2: Comparison of bibliographic databases through statistics of citation networks. Panels A-F show 
studentized statistics residuals of citation networks extracted from bibliographic databases. The residuals are listed 
in decreasing order, while the shaded regions are 95% and 99% confidence intervals of independent Student t-
tests (labelled with respective P-values). Panel G shows the residuals of the independent statistics only, where the
shaded region is the 95% confidence interval. Panel H shows pairwise Spearman correlations of independent statistics
listed in the same order as in panel G (left) and the P-values of the corresponding Fisher independence z-tests 
(right). Panel I shows the critical difference diagram of Nemenyi post-hoc test for the independent statistics. The 
diagram illustrates the overall ranking of the databases, where those connected by a thick line show no statistically 
significant inconsistencies at P-value = 0.05 (see Methods). 


Figure 3: Comparison of bibliographic and online databases through statistics of networks. Panels A-D 
show studentized statistics residuals of citation networks extracted from bibliographic databases, while panels E 
and F show residuals of social and technological networks extracted from online databases. The residuals are 
listed in decreasing order, while the shaded regions are 95% and 99% confidence intervals of independent Student 
t-tests (labelled with respective P-values). Panel G shows the residuals of the independent statistics only, where the
shaded region is the 95% confidence interval. Panel H shows pairwise Spearman correlations of independent statistics
listed in the same order as in panel G (left) and the P-values of the corresponding Fisher independence z-tests 
(right). Panel I shows the critical difference diagram of Nemenyi post-hoc test for the independent statistics. The 
diagram illustrates the overall ranking of the databases, where those connected by a thick line show no statistically 
significant inconsistencies at P-value = 0.05 (see Methods). 



Tables 



            Descriptive statistics            Field decomposition
Source      # Nodes    # Links      % WCC     % In-field   % Core   % Out-field
WoS         140,362    639,110      97.0%     11.2%        51.4%    34.4%
CiteSeer    384,413    1,744,619    95.0%     10.5%        37.7%    46.8%
Cora        23,166     91,500       100.0%    8.5%         51.4%    40.1%
HistCite    4,324      41,595       98.7%     44.8%        52.2%    1.6%
DBLP        12,591     49,744       99.2%     74.5%        16.9%    7.8%
arXiv       34,546     421,534      99.6%     6.7%         74.7%    18.1%
Gnutella    62,586     147,892      100.0%    73.8%        25.7%    0.5%
Twitter     81,306     1,768,135    100.0%    13.8%        86.2%    0.0%


Table 1: Descriptive statistics and field decompositions of citation and other networks. Respective
bibliographic or online databases are given under the column denoted by “Source”. Descriptive statistics list the number
of network nodes $n$ and links $m$, and the percentage of nodes in the largest weakly connected component (column
labelled “% WCC”). Columns labelled “% In-field”, “% Core” and “% Out-field” report the percentages of nodes
in each of the components of the field bow-tie decomposition (see Methods).


            Degree distributions                                             Degree mixing
Source      $\langle k \rangle$   $\gamma$   $\gamma_{in}$   $\gamma_{out}$   $r$      $r_{(in,in)}$   $r_{(in,out)}$   $r_{(out,in)}$   $r_{(out,out)}$
WoS         9.11                  2.74       2.39            3.88             -0.06    0.04            -0.02            -0.03            0.09
CiteSeer    9.08                  2.65       2.28            3.82             -0.06    0.05            0.00             0.00             0.12
Cora        7.90                  2.88       2.60            4.00             -0.06    0.07            0.02             0.00             0.17
HistCite    9.99                  2.55       3.50            2.37             -0.10    0.11            0.01             -0.13            0.00
DBLP        7.90                  2.42       2.64            2.75             -0.05    0.00            -0.02            -0.05            -0.02
arXiv       24.40                 2.67       2.54            3.45             -0.01    0.08            -0.04            0.00             0.11
Gnutella    4.73                  6.37       7.59            4.78             -0.09    0.03            0.01             -0.01            0.00
Twitter     43.49                 2.05       2.31            2.37             -0.03    0.00            0.06             -0.02            0.06


Table 2: Degree distributions and mixing of citation and other networks. Respective bibliographic or online
databases are given under the column denoted by “Source”. Degree distributions are represented by the mean
network degree $\langle k \rangle$ and the scale-free exponents of the power-law degree, in-degree and out-degree distributions
(columns labelled “$\gamma$”, “$\gamma_{in}$” and “$\gamma_{out}$”, respectively). Degree mixing statistics list the undirected mixing
coefficient $r$ and four directed degree mixing coefficients $r_{(\alpha,\beta)}$, $\alpha, \beta \in \{in, out\}$ (see Methods).


            Clustering distributions                                             Clustering mixing           Diameter statistics
Source      $\langle c \rangle$   $\langle b \rangle$     $\langle d \rangle$   $r_c$    $r_b$    $r_d$      $\delta_{90}$    $\delta'_{90}$
WoS         0.14                  $0.08 \cdot 10^{-2}$    0.16                  0.16     0.43     0.36       8.85 ± 0.01      7.79 ± 0.03
CiteSeer    0.18                  $0.07 \cdot 10^{-2}$    0.21                  0.14     0.44     0.40       28.57 ± 0.23     9.01 ± 0.04
Cora        0.27                  $0.46 \cdot 10^{-2}$    0.32                  0.17     0.50     0.40       21.12 ± 0.16     8.17 ± 0.03
HistCite    0.31                  $0.20 \cdot 10^{-2}$    0.36                  0.05     0.36     0.41       7.97 ± 0.03      7.22 ± 0.04
DBLP        0.12                  $0.14 \cdot 10^{-2}$    0.14                  0.10     0.35     0.26       9.13 ± 0.07      6.24 ± 0.02
arXiv       0.28                  $0.64 \cdot 10^{-2}$    0.33                  0.13     0.46     0.39       21.71 ± 0.12     6.04 ± 0.02
Gnutella    0.01                  $0.03 \cdot 10^{-2}$    0.01                  0.09     0.25     0.17       12.83 ± 0.11     7.70 ± 0.01
Twitter     0.57                  $0.35 \cdot 10^{-2}$    0.63                  0.09     0.54     0.40       6.90 ± 0.02      5.50 ± 0.01


Table 3: Clustering and diameter statistics of citation and other networks. Respective bibliographic or online
databases are given under the column denoted by “Source”. Clustering distributions are represented by the means
of the standard and unbiased clustering coefficients (columns labelled “$\langle c \rangle$”, “$\langle b \rangle$” and “$\langle d \rangle$”, respectively).
Clustering mixing statistics list the corresponding mixing coefficients $r_c$, $r_b$ and $r_d$. Diameter statistics report the means
and s.e.m. of the standard and undirected effective diameters (columns labelled “$\delta_{90}$” and “$\delta'_{90}$”, respectively).